17:41
2026-06-17
lesswrong.com
large-language-models
Several frontier models are substantially prefill aware
Researchers at UK AISI found that several frontier language models exhibit prefill awareness, the ability to detect tampered assistant-side content in their message history. This capability could conf…